The red wine industry has shown rapid growth recently as social drinking is on the rise. Nowadays, industry players use product quality certifications to promote their products. Certification is a time-consuming process that requires assessment by human experts, which makes it very expensive. The price of red wine also depends on the rather abstract concept of wine appreciation by tasters, whose opinions can vary widely. Another vital factor in red wine certification and quality assessment is physicochemical testing, which is laboratory-based and measures factors like acidity, pH level, sugar, and other chemical properties. It would be of great interest to the red wine market if the human sense of taste could be related to the wine's chemical properties, so that certification, quality assessment, and assurance processes become more controlled.

Goals: determine which features are the best indicators of red wine quality and generate insights into how each of these factors contributes to the model's quality prediction; predict the quality of a wine on the basis of the given features; deploy the model.
import pandas as pd               # for reading the data file
import numpy as np                # for numerical and mathematical expressions
import seaborn as sns             # for data visualization
import matplotlib.pyplot as plt   # for data visualization
%matplotlib inline
import time
import warnings
warnings.filterwarnings('ignore')
from sklearn import metrics
data = pd.read_csv("Wine_Quality_Predictionand_Deployment_TASK_6.csv")
data.head()
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
pip install pandas-profiling
Note: you may need to restart the kernel to use updated packages.
from pandas_profiling import ProfileReport
profile = ProfileReport(data, title="Pandas Profiling Report")
profile
print('Total duplicates before:', data.duplicated().sum())
print('Shape before dropping duplicates:', data.shape)
data.drop_duplicates(inplace=True)
print('Shape after dropping duplicates:', data.shape)
Total duplicates before: 240
Shape before dropping duplicates: (1599, 12)
Shape after dropping duplicates: (1359, 12)
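The deduplication step above can be illustrated on a toy frame; `drop_duplicates` keeps the first occurrence of each repeated row by default (a minimal sketch with made-up values):

```python
import pandas as pd

# Toy frame with one exact duplicate row (row 2 repeats row 0).
df = pd.DataFrame({"acidity": [7.4, 7.8, 7.4], "quality": [5, 5, 5]})
print(df.duplicated().sum())     # 1 duplicate detected
deduped = df.drop_duplicates()   # keeps the first occurrence by default
print(deduped.shape)             # (2, 2)
```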
data.isnull().sum()
fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64
fig, ax = plt.subplots(ncols=6, nrows=2, figsize=(20, 10))
index = 0
ax = ax.flatten()
for cols, value in data.items():
    if cols != 'type':
        sns.boxplot(y=cols, data=data, ax=ax[index])
        index += 1
plt.tight_layout(pad=0.5, w_pad=0.7, h_pad=0.7)
plt.show()
fig, ax = plt.subplots(ncols=2, nrows=6, figsize=(20, 20))
index = 0
ax = ax.flatten()
for col, value in data.items():
    if col != 'type':
        sns.distplot(value, ax=ax[index])
        index += 1
plt.tight_layout(pad=0.5, w_pad=0.7, h_pad=5.0)
from scipy.stats import zscore
z = abs(zscore(data))
print(z.shape)
data = data[(z<3).all(axis=1)]
data.shape
(1359, 12)
(1232, 12)
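The z-score filter above keeps only rows whose every feature lies within 3 standard deviations of its column mean. A minimal sketch on synthetic data (values are made up):

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Toy frame: 20 rows, one extreme outlier in column "x".
df = pd.DataFrame({"x": [1.0] * 19 + [100.0], "y": range(20)})
z = np.abs(zscore(df))            # |z| per cell, computed column-wise
kept = df[(z < 3).all(axis=1)]    # keep rows where every |z| < 3
print(kept.shape)                 # (19, 2): the outlier row is dropped
```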
from scipy.stats import skew
for col in data:
    print(col)
    print(skew(data[col]))
    plt.figure()
    sns.distplot(data[col])
    plt.show()
fixed acidity 0.8061619824138205
volatile acidity 0.43308445717948635
citric acid 0.2759663007760168
residual sugar 2.3718232268318102
chlorides 2.461746675680816
free sulfur dioxide 0.8662329717800913
total sulfur dioxide 1.1611815549035225
density 0.017661782151063662
pH 0.1186075802936275
sulphates 0.9494203726359354
alcohol 0.7664142127820375
quality 0.4022066765498909
data['fixed acidity'] = np.sqrt(data['fixed acidity'])
data['residual sugar'] = np.sqrt(data['residual sugar'])
data['chlorides'] = np.sqrt(data['chlorides'])
data['free sulfur dioxide'] = np.sqrt(data['free sulfur dioxide'])
data['pH'] = np.sqrt(data['pH'])
for col in data:
    print(col)
    print(skew(data[col]))
fixed acidity 0.5825645881498726
volatile acidity 0.43308445717948635
citric acid 0.2759663007760168
residual sugar 1.732402131415575
chlorides 1.461866045806564
free sulfur dioxide 0.318084567977603
total sulfur dioxide 1.1611815549035225
density 0.017661782151063662
pH 0.055277424173582446
sulphates 0.9494203726359354
alcohol 0.7664142127820375
quality 0.4022066765498909
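The square-root transform compresses the right tail and so reduces positive skew, as the before/after numbers above show. A small sketch on a synthetic right-skewed sample:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)  # right-skewed sample
print(round(skew(x), 2))                   # strongly positive
print(round(skew(np.sqrt(x)), 2))          # much closer to 0 after the transform
```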
data['quality_map'] = data['quality'].map({3:'bad',4:"bad",5:'bad',6:'good',7:'good',8:'good'})
data['quality_map'].value_counts().plot(kind='pie',autopct='%1.1f%%')
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'fixed acidity', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'volatile acidity', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'citric acid', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'residual sugar', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'chlorides', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'total sulfur dioxide', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'sulphates', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'alcohol', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.barplot(x = 'quality', y = 'pH', data =data)
plt.show()
features_ = data.columns.values[:-1]
fig = plt.figure(figsize=(16, 26))
for column, feature in enumerate(features_):
    fig.add_subplot(5, 3, column + 1)
    sns.boxplot(data=data, x='quality', y=feature)
plt.show()
plt.figure(figsize = (15,5))
sns.lineplot(x = 'volatile acidity', y = 'fixed acidity', hue = 'quality_map', data =data)
plt.show()
plt.figure(figsize = (15,5))
sns.lineplot(x = 'alcohol', y = 'residual sugar', hue = 'quality_map', data =data)
plt.show()
sns.pairplot(data)
<seaborn.axisgrid.PairGrid at 0x1af05d1d580>
sns.pairplot(data = data,hue = 'quality')
<seaborn.axisgrid.PairGrid at 0x1af05bd3b20>
corr = data.corr()
plt.figure(figsize = (35,15))
plt.title('Correlation of all the Columns', fontsize = 20)
sns.heatmap(data.corr(), annot = True, vmin = -1, vmax = 1, center = 0, fmt = '.1g', linewidths = 1, linecolor = 'white',
square = True, cmap ='RdBu')
<AxesSubplot:title={'center':'Correlation of all the Columns'}>
corr = data.corr()
corr['quality'].sort_values(ascending=False)
quality                 1.000000
alcohol                 0.476166
sulphates               0.251397
citric acid             0.226373
fixed acidity           0.124052
residual sugar          0.013732
free sulfur dioxide    -0.050656
pH                     -0.057731
chlorides              -0.128907
density                -0.174919
total sulfur dioxide   -0.185100
volatile acidity       -0.390558
Name: quality, dtype: float64
correlations = data.corr()['quality'].drop('quality')
df = pd.read_csv("Wine_Quality_Predictionand_Deployment_TASK_6.csv")
def get_features(correlation_threshold):
    abs_corrs = correlations.abs()
    high_correlations = abs_corrs[abs_corrs > correlation_threshold].index.values.tolist()
    return high_correlations
features = get_features(0.05)
print(features)
x = df[features]
y = df['quality']
['fixed acidity', 'volatile acidity', 'citric acid', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']
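`get_features` simply keeps every feature whose absolute correlation with quality exceeds the threshold. A self-contained sketch with made-up correlation values:

```python
import pandas as pd

# Hypothetical correlations of three features with quality.
correlations = pd.Series(
    {"alcohol": 0.48, "volatile acidity": -0.39, "residual sugar": 0.01}
)

def get_features(correlation_threshold):
    abs_corrs = correlations.abs()
    # Features whose |correlation| exceeds the threshold survive.
    return abs_corrs[abs_corrs > correlation_threshold].index.values.tolist()

print(get_features(0.05))  # ['alcohol', 'volatile acidity']
```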
x
| | fixed acidity | volatile acidity | citric acid | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 |
| 1 | 7.8 | 0.880 | 0.00 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 |
| 2 | 7.8 | 0.760 | 0.04 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 |
| 3 | 11.2 | 0.280 | 0.56 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 |
| 4 | 7.4 | 0.700 | 0.00 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 |
| 1595 | 5.9 | 0.550 | 0.10 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 |
| 1596 | 6.3 | 0.510 | 0.13 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 |
| 1597 | 5.9 | 0.645 | 0.12 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 |
| 1598 | 6.0 | 0.310 | 0.47 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 |
1599 rows × 10 columns
y
0 5
1 5
2 5
3 6
4 5
..
1594 5
1595 6
1596 6
1597 5
1598 6
Name: quality, Length: 1599, dtype: int64
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=3)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
(1199, 10)
(1199,)
(400, 10)
(400,)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# fitting linear regression to training data
regressor = LinearRegression()
regressor.fit(x_train,y_train)
LinearRegression()
# this gives the coefficients of the 10 features selected above.
regressor.coef_
array([ 0.01773723, -0.99256049, -0.13962865, -1.59094279, 0.00559652,
-0.00351973, 0.76859036, -0.43741414, 0.81288805, 0.30148385])
train_pred = regressor.predict(x_train)
train_pred
array([5.33777144, 5.33826411, 5.9503318 , ..., 6.3903182 , 6.19979375,
5.27597259])
# R^2 (coefficient of determination) on the training and test sets
print("Training R^2 :", regressor.score(x_train, y_train))
print("Testing R^2 :", regressor.score(x_test, y_test))
Training R^2 : 0.3526492871983752
Testing R^2 : 0.37860667718040975
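Note that `LinearRegression.score` returns the coefficient of determination R^2, not a classification accuracy. A tiny sketch on made-up data confirming that it matches `r2_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Nearly linear toy data, y ~ 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.1, 5.9, 8.0])
reg = LinearRegression().fit(X, y)

# .score is R^2, i.e. identical to r2_score on the same predictions.
assert np.isclose(reg.score(X, y), r2_score(y, reg.predict(X)))
print(round(reg.score(X, y), 3))
```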
test_pred = regressor.predict(x_test)
test_pred
array([5.10801475, 5.65933623, 5.90407267, 6.13461179, 5.00611866,
5.44514691, 5.05735245, 6.15497513, 5.51919603, 5.77259374,
5.61809366, 5.23616173, 5.23544213, 5.31968644, 6.47007277,
5.043404 , 5.85287121, 5.19427909, 6.07727089, 6.34949018,
6.42525555, 5.51221957, 5.8030796 , 4.93637817, 5.16618356,
5.48255293, 5.13758624, 6.60000969, 5.88754763, 5.74133915,
6.09716961, 6.29379754, 4.91269821, 5.88611904, 5.11007273,
5.94574773, 6.80685536, 5.04305653, 5.25438683, 5.88611904,
5.17406542, 4.84008442, 6.48781656, 5.40521715, 5.31105571,
5.84484462, 5.7100681 , 5.24300809, 5.25021217, 5.46398911,
5.08740494, 5.61369555, 6.01375792, 6.32497377, 5.47511954,
5.36466869, 5.09234555, 4.92625623, 5.21415941, 5.08274744,
4.79570013, 5.4377645 , 5.25237771, 5.68830391, 5.85145609,
6.52420079, 5.38691412, 5.71775637, 5.17641417, 5.99156845,
5.6445189 , 5.60892012, 5.74967567, 5.21702288, 5.97975854,
5.51115845, 5.41121547, 5.6832459 , 5.63971524, 5.74133915,
6.24163428, 5.27915822, 4.66596769, 6.04951743, 5.52401618,
5.17823915, 5.20672986, 5.96322663, 5.50411353, 5.64866275,
5.70105618, 5.6431575 , 5.72586828, 5.3173125 , 5.37075392,
5.394889 , 4.82061159, 5.46006525, 5.47363879, 6.54074801,
6.13723937, 5.61422461, 6.07821503, 6.17461539, 5.73230665,
4.92692198, 4.73317591, 5.03851027, 5.44868797, 5.78432759,
6.46608259, 5.47530673, 6.46876056, 5.94466642, 5.43257493,
5.20523855, 5.34551741, 5.20749557, 6.19344578, 5.61453943,
5.83308923, 5.20267759, 5.17702922, 5.26912156, 5.74382704,
5.6431575 , 6.15450941, 5.89677877, 5.49186029, 5.39047629,
5.25848318, 5.41150099, 5.70750135, 5.68001376, 6.58288921,
5.89497164, 6.37172338, 5.72945992, 5.37936908, 5.14371952,
5.58851063, 6.59661777, 5.24403336, 5.25594627, 5.54721935,
5.17243958, 5.76990082, 6.10847777, 6.93985005, 4.99562031,
5.01958735, 4.68547026, 5.82434616, 5.01708671, 5.21702288,
5.70819215, 5.63334181, 5.33481542, 5.22220103, 5.84327644,
5.61823023, 5.78078643, 5.51830906, 6.03898671, 5.63808482,
5.49193476, 5.96787582, 4.82363567, 5.26331016, 5.6625652 ,
5.73510278, 6.59570394, 5.02584187, 5.9062506 , 5.85381667,
5.21140744, 5.68951564, 5.51649995, 5.40521715, 6.3797489 ,
6.71642336, 4.98858413, 5.88413601, 5.75553621, 5.75508093,
5.61677128, 5.71318209, 5.40944809, 6.05634078, 5.58276397,
5.88366084, 6.51928901, 5.00668628, 5.4022397 , 5.18103224,
5.17641417, 5.47511954, 5.7498414 , 5.69035164, 4.92770738,
5.12908401, 4.98756458, 6.18242395, 5.65384546, 5.45261211,
5.56461849, 4.99367423, 5.8451887 , 5.31537513, 5.48096741,
5.69721582, 5.63441998, 5.69158339, 5.82596777, 5.79120742,
6.02976291, 6.20119324, 5.2711036 , 5.04432824, 5.21865445,
5.38560996, 4.97814926, 6.21495495, 5.44287205, 5.94855371,
5.21554859, 6.61250215, 5.08358009, 5.29397027, 5.0345619 ,
6.16924449, 5.78078489, 4.85807345, 5.74347603, 5.29957538,
5.35504601, 5.17674741, 6.30056659, 5.58203663, 4.95859588,
6.10234044, 6.03135183, 6.16235396, 5.41939397, 6.76220041,
6.20905743, 6.08858508, 5.2295158 , 5.45339068, 5.54357835,
5.35504601, 5.24163328, 5.74950618, 5.25054347, 6.1317778 ,
5.42684978, 5.84929436, 4.82101128, 6.06442799, 5.06580635,
6.43296901, 6.06017586, 5.69914195, 5.70750135, 4.90849494,
6.00470504, 5.28204515, 5.70851728, 5.42245606, 5.12401141,
6.4859156 , 5.3065303 , 5.97368396, 5.64309435, 6.49134579,
6.20544121, 5.09817253, 5.47044255, 5.30164692, 5.24148223,
6.37172338, 5.38111102, 5.41238036, 6.00040309, 4.98474017,
5.89003832, 5.35695446, 5.18204386, 5.43418226, 5.92040958,
4.83001281, 6.84476407, 5.17004942, 4.90738182, 5.71686848,
5.6789345 , 5.30006437, 6.28821696, 6.8852049 , 6.5809374 ,
5.94544069, 6.3372236 , 5.90511355, 5.56832499, 6.0037674 ,
5.51592572, 5.47416032, 5.73230665, 5.31523101, 5.15200739,
6.22058683, 5.30080495, 6.2234137 , 6.09823805, 5.86965547,
5.42671619, 4.83521332, 6.04105625, 5.17954177, 5.11991572,
6.45483786, 5.50117269, 6.75189479, 5.112668 , 5.16698378,
5.30775351, 5.71062442, 5.10611189, 5.54132974, 5.3123985 ,
5.16102307, 4.95778371, 5.4565424 , 5.38281735, 5.34456446,
5.18064244, 6.11817384, 5.624629 , 5.7335454 , 6.34845774,
5.90187012, 5.51649995, 5.69382597, 5.14872878, 5.70431206,
6.40518994, 6.17170859, 5.46670957, 6.06382552, 5.69529022,
6.24040985, 5.24148223, 5.5445765 , 5.04703379, 4.99563993,
5.07398398, 5.81020153, 5.40277302, 5.9958201 , 5.1304824 ,
6.51949038, 5.39875246, 5.49117974, 5.64759732, 5.52711217,
5.27288466, 6.49134579, 5.83141585, 5.03473182, 5.24121201,
5.49075176, 5.27342775, 5.5118796 , 5.04396982, 5.29167137,
5.46007816, 5.36402691, 6.12719352, 4.99200331, 5.30164692,
6.08655307, 5.20523855, 5.13058231, 4.66155684, 6.15823093,
6.15917286, 6.50582017, 5.80335212, 5.74347603, 6.39828821,
6.14436092, 5.88754763, 6.05466721, 6.03231119, 5.36683868,
5.41989769, 5.61706715, 5.4057693 , 5.76283208, 5.2734642 ])
# rounding the predicted values for the test set
predicted_data = np.round(test_pred)
predicted_data
array([5., 6., 6., 6., 5., 5., 5., 6., 6., 6., 6., 5., 5., 5., 6., 5., 6.,
5., 6., 6., 6., 6., 6., 5., 5., 5., 5., 7., 6., 6., 6., 6., 5., 6.,
5., 6., 7., 5., 5., 6., 5., 5., 6., 5., 5., 6., 6., 5., 5., 5., 5.,
6., 6., 6., 5., 5., 5., 5., 5., 5., 5., 5., 5., 6., 6., 7., 5., 6.,
5., 6., 6., 6., 6., 5., 6., 6., 5., 6., 6., 6., 6., 5., 5., 6., 6.,
5., 5., 6., 6., 6., 6., 6., 6., 5., 5., 5., 5., 5., 5., 7., 6., 6.,
6., 6., 6., 5., 5., 5., 5., 6., 6., 5., 6., 6., 5., 5., 5., 5., 6.,
6., 6., 5., 5., 5., 6., 6., 6., 6., 5., 5., 5., 5., 6., 6., 7., 6.,
6., 6., 5., 5., 6., 7., 5., 5., 6., 5., 6., 6., 7., 5., 5., 5., 6.,
5., 5., 6., 6., 5., 5., 6., 6., 6., 6., 6., 6., 5., 6., 5., 5., 6.,
6., 7., 5., 6., 6., 5., 6., 6., 5., 6., 7., 5., 6., 6., 6., 6., 6.,
5., 6., 6., 6., 7., 5., 5., 5., 5., 5., 6., 6., 5., 5., 5., 6., 6.,
5., 6., 5., 6., 5., 5., 6., 6., 6., 6., 6., 6., 6., 5., 5., 5., 5.,
5., 6., 5., 6., 5., 7., 5., 5., 5., 6., 6., 5., 6., 5., 5., 5., 6.,
6., 5., 6., 6., 6., 5., 7., 6., 6., 5., 5., 6., 5., 5., 6., 5., 6.,
5., 6., 5., 6., 5., 6., 6., 6., 6., 5., 6., 5., 6., 5., 5., 6., 5.,
6., 6., 6., 6., 5., 5., 5., 5., 6., 5., 5., 6., 5., 6., 5., 5., 5.,
6., 5., 7., 5., 5., 6., 6., 5., 6., 7., 7., 6., 6., 6., 6., 6., 6.,
5., 6., 5., 5., 6., 5., 6., 6., 6., 5., 5., 6., 5., 5., 6., 6., 7.,
5., 5., 5., 6., 5., 6., 5., 5., 5., 5., 5., 5., 5., 6., 6., 6., 6.,
6., 6., 6., 5., 6., 6., 6., 5., 6., 6., 6., 5., 6., 5., 5., 5., 6.,
5., 6., 5., 7., 5., 5., 6., 6., 5., 6., 6., 5., 5., 5., 5., 6., 5.,
5., 5., 5., 6., 5., 5., 6., 5., 5., 5., 6., 6., 7., 6., 6., 6., 6.,
6., 6., 6., 5., 5., 6., 5., 6., 5.])
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, test_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, test_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, test_pred)))
Mean Absolute Error: 0.48443407559851015
Mean Squared Error: 0.39380413462864783
Root Mean Squared Error: 0.6275381539226502
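The three error metrics can be checked by hand on a toy pair of vectors (made-up values):

```python
import numpy as np
from sklearn import metrics

y_true = np.array([5, 6, 5, 7])
y_pred = np.array([5.5, 5.5, 5.0, 6.0])
mae = metrics.mean_absolute_error(y_true, y_pred)  # mean |error| -> 0.5
mse = metrics.mean_squared_error(y_true, y_pred)   # mean squared error -> 0.375
rmse = np.sqrt(mse)                                # back on the quality scale
print(mae, mse, rmse)
```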
coefficients = pd.DataFrame(regressor.coef_, features)
coefficients.columns = ['Coefficient']
coefficients
| | Coefficient |
|---|---|
| fixed acidity | 0.017737 |
| volatile acidity | -0.992560 |
| citric acid | -0.139629 |
| chlorides | -1.590943 |
| free sulfur dioxide | 0.005597 |
| total sulfur dioxide | -0.003520 |
| density | 0.768590 |
| pH | -0.437414 |
| sulphates | 0.812888 |
| alcohol | 0.301484 |
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             precision_score, recall_score, auc, roc_curve, f1_score,
                             plot_confusion_matrix)
# creating the model
model = LogisticRegression()
# feeding the training set into the model
model.fit(x_train, y_train)
# predicting the results for the test set
y_pred = model.predict(x_test)
# calculating the training and testing accuracies
print("Training accuracy :", model.score(x_train, y_train))
print("Testing accuracy :", model.score(x_test, y_test))
# Print accuracy scores
print(f'Model accuracy score: {round(accuracy_score(y_test, y_pred) * 100, 2)}%')
# classification report
print(classification_report(y_test, y_pred))
Training accuracy : 0.5829858215179317
Testing accuracy : 0.57
Model accuracy score: 57.0%
precision recall f1-score support
3 0.00 0.00 0.00 3
4 0.00 0.00 0.00 14
5 0.61 0.77 0.68 166
6 0.53 0.61 0.57 166
7 0.00 0.00 0.00 48
8 0.00 0.00 0.00 3
accuracy 0.57 400
macro avg 0.19 0.23 0.21 400
weighted avg 0.47 0.57 0.52 400
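Classes 3, 4, 7, and 8 get zero precision and recall above because they are rare; one common mitigation (not used here) is `class_weight='balanced'`, which re-weights classes inversely to their frequency. A hedged sketch on synthetic imbalanced data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Imbalanced toy set: 95 samples of class 0, 5 of class 1.
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(2, 1, (5, 2))])
y = np.array([0] * 95 + [1] * 5)

plain = LogisticRegression().fit(X, y)
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

# The balanced model up-weights the rare class, which typically
# makes it predict that class more often (higher minority recall).
print(plain.predict(X).sum(), balanced.predict(X).sum())
```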
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score,classification_report
modelr = RandomForestClassifier()
modelr.fit(x_train,y_train)
RandomForestClassifier()
x_test_prediction = modelr.predict(x_test)
test_data_accuracy = accuracy_score(y_test, x_test_prediction)
print('Accuracy :',test_data_accuracy)
Accuracy : 0.72
print(classification_report(y_test,x_test_prediction))
precision recall f1-score support
3 0.00 0.00 0.00 3
4 0.00 0.00 0.00 14
5 0.74 0.85 0.79 166
6 0.71 0.73 0.72 166
7 0.69 0.52 0.60 48
8 0.33 0.33 0.33 3
accuracy 0.72 400
macro avg 0.41 0.41 0.41 400
weighted avg 0.69 0.72 0.70 400
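`GridSearchCV` was imported earlier but never used; it could tune the random forest rather than accepting its defaults. A sketch on a synthetic dataset (the parameter grid values are illustrative, not tuned for this wine data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the wine feature matrix.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,  # 3-fold cross-validation over each parameter combination
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```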
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
accuracy = []
for i in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(x_train, y_train)
    ypred = knn.predict(x_test)
    ac = accuracy_score(y_test, ypred)
    accuracy.append(ac)
plt.plot(range(1, 31), accuracy)
plt.grid(True)
plt.show()
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(x_train,y_train)
train_pred = knn.predict(x_train)
test_pred = knn.predict(x_test)
print('Training Accuracy :',accuracy_score(y_train,train_pred))
print('Test Accuracy :',accuracy_score(y_test,test_pred))
print('Confusion Matrix : \n',confusion_matrix(y_test,test_pred))
Training Accuracy : 0.5904920767306089
Test Accuracy : 0.49
Confusion Matrix :
 [[  0   0   1   2   0   0]
 [  0   0   6   8   0   0]
 [  0   1 109  56   0   0]
 [  0   0  83  80   3   0]
 [  0   0  14  27   7   0]
 [  0   0   2   1   0   0]]
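Choosing k from the elbow plot of test-set accuracy lets the test set influence model selection. A hedged alternative is to select k by cross-validation on the training data alone (sketch on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training split.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Mean 5-fold CV accuracy for each odd k, using training data only.
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 31, 2)
}
best_k = max(scores, key=scores.get)  # k chosen without touching the test set
print(best_k)
```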
print(classification_report(y_test, test_pred))
precision recall f1-score support
3 0.00 0.00 0.00 3
4 0.00 0.00 0.00 14
5 0.59 0.44 0.51 166
6 0.46 0.77 0.57 166
7 1.00 0.02 0.04 48
8 0.00 0.00 0.00 3
accuracy 0.50 400
macro avg 0.34 0.20 0.19 400
weighted avg 0.56 0.50 0.45 400
import pickle
pickle.dump(modelr,open('wine_modelr.pkl','wb'))
model = pickle.load(open('wine_modelr.pkl','rb'))
result = model.score(x_test,y_test)
print(result)
0.72
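The pickle round-trip above is the essence of the deployment step: serialize the fitted model once, then reload it in the serving process. A self-contained sketch (synthetic data and a fresh model stand in for `modelr`; the filename is illustrative):

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the wine training data and fitted model.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

with open("wine_model_demo.pkl", "wb") as f:   # serialize the fitted model
    pickle.dump(model, f)
with open("wine_model_demo.pkl", "rb") as f:   # reload it in a serving process
    loaded = pickle.load(f)

sample = X[:1]                                 # one "incoming" feature row
print(loaded.predict(sample))                  # same output as the original model
```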